ABSTRACT



An emphasis on therapy research and empirically supported 
treatments (ESTs) is currently suggested as the answer to many therapists' 
and counselors' problems with behavioral healthcare. Those eager to protect 
psychologists' share of behavioral care in an era of managed care are
utilizing ESTs and process research to demonstrate the effectiveness of their 
clinical interventions. At an increasing rate, therapists are being told that 
their future in delivering behavioral healthcare depends upon their ability 
to apply the knowledge gained from several decades of psychotherapy research. 
In trying to prepare students for their professional lives as counselors in 
an era dominated by managed care, counseling education programs are
increasingly emphasizing the treatment techniques supported by empirical 
research. This paper reviews the limitations of the extant efficacy research. 
It discusses the challenges of providing managed care psychotherapy and the 
limitations of ESTs. It debates the issue of research therapy versus clinical 
therapy and warns against an oversimplification of treatment standards. In 
conclusion, the paper suggests that integrating the findings from 
naturalistic studies can foster more accurate understanding of the everyday 
challenges confronting clinicians in the field. (Contains 24 references.) 
(Author/JDM) 
Abstract

Emphasis on therapy research and empirically supported
treatments (ESTs) is currently being touted as the answer to many
therapists' and counselors' problems in the current behavioral
healthcare marketplace. Utilizing treatment outcome and process
research to demonstrate the effectiveness of clinical interventions
is being encouraged by those eager to protect psychologists' share
of behavioral healthcare in the managed care era. Far more so than
was previously the case, therapists are being told that their 
future in successfully delivering behavioral healthcare depends 
upon their ability to apply the knowledge gained from several 
decades of psychotherapy research. In trying to prepare students 
for their professional lives as counselors and therapists in a 
managed-care-dominated world, counseling education programs
increasingly emphasize the treatment techniques supported by 
empirical research. This paper reviews the limitations of the 
extant efficacy research. 

The Challenges of Providing Managed Care Psychotherapy 

Generally, psychotherapy outcome research is consistent with 
the possibility of providing effective therapy within a managed 
care framework that emphasizes brief, circumscribed therapy. 
However, wildly varying utilization and quality control practices 
subsumed by the label "managed care" make it challenging to
generalize about new practice requirements. 

Nonetheless, providers have been encouraged to prepare 
themselves by mastering certain essential skills. Competence in 
delivering ESTs (Division 12 Task Force on Promotion and 
Dissemination of Psychological Procedures, 1995; Chambless et 
al., 1996) for specific disorders is being encouraged by training
programs, continuing education workshops, and professional 
organizations. However, before it is assumed that students should
learn only the empirically validated therapeutic approaches, it 
is important to examine the limitations of the current empirical 
research and to consider how these limitations constrain the 
generalizability of many of the available therapy experiments.

Limitations of ESTs 

While the author strongly supports the growing emphasis on 
empirical justification of clinical practices, it is also 
important not to overlook the limited representativeness of the 
randomized control trial (RCT) paradigm that provides the basis 
for the current listing of ESTs. Efficacy studies often 
generalize poorly to real life therapeutic situations, yet they 
are increasingly used to constrain and curtail real life 
provision of treatment. In an indictment of the current emphasis
on RCTs over more naturalistic studies, Seligman and Levant
(1998) argue that "...efficacy studies can by their very nature
"validate" only brief, simple, and inexpensive treatment.
Efficacy studies cannot test longer and more complicated
modalities much less "validate" them... Efficacy researchers have
become the unwitting vehicle for short-changing patients in need
of more than brief therapy -- on a massive scale." (p. 212)

This article summarizes some of the major limitations of 
efficacy studies. It also explores how these problems inherently 
limit our ability to rely exclusively on ESTs as a means of 
operationalizing best clinical practices. 

The Irony of Psychotherapy Outcome Research:
The "Best" Controlled Research is Often Least Representative

Debate about the generalizability of therapy research has a 
long history, beginning with questions about the usefulness of 
therapy analogue research, in which volunteers who would not 
normally seek professional help are placed in therapy-like 
situations. In comparison with actual clinical contexts, 
analogue studies generally address less intense and disruptive 
problems, among patients who are less disturbed, using therapists 
who are less experienced (Kazdin & Wilcoxon, 1976; Kazdin, 1978, 
1986; Marks, 1978; Rosen, 1975).

Analogue research permits experimental control of variables, 
but its artificiality limits its findings' real world relevance. 
Similarly, many have challenged the generalizability of clinical 
efficacy studies due to their lack of realism, even though they 
do use actual patient populations. Crits-Christoph (1992) argued
that it is not clear whether outcome results from studies in
which therapists adhered to fairly rigid protocols generalize to
eclectic psychotherapy as it is ordinarily practiced. Seligman
and Levant (1998) share this concern, citing
discrepancies between the conclusions of efficacy and 
effectiveness studies. 

As Shadish et al. (1997) noted, in therapy experiments there
is inevitably a tradeoff between features that enhance internal
validity and those that optimize external validity. "Features of
studies that facilitate optimal causal inference (e.g., a 
population that will accept random assignment) threaten 
generalization (e.g., patients willing to be randomly assigned 
may differ from the population of interest)." (p. 362)
Understanding the distinctions between treatments as delivered in 
typical controlled efficacy studies and those provided "in the 
field" can improve our ability to apply these findings 
appropriately.



Research Therapy vs Clinic Therapy 

The differences between "research" and "clinic" therapies
emphasized by Weisz, Weiss, and Donenberg (1992) are: (1) in
research therapy, participants are actively recruited by the
experimenter, while in clinic therapy they are self-referred or
referred by others; (2) in research therapy, patients are
generally more homogeneous than the diverse array of clients
treated in clinic therapy; (3) in research therapy, treatment 
usually addresses one main problem, while clinic therapy 
typically addresses clients with a blend of difficulties; (4) in 
research therapy, therapists are usually recently trained in the 
specific procedure being investigated, which typically does not 
occur in clinic therapy; (5) in research therapy, the therapist 
is instructed to exclusively use the protocol under study, while 
in clinic therapy the therapist is not similarly constrained; and 
(6) in research therapy, a treatment manual is often used and 
treatment implementation is monitored, which rarely happens in 
clinic therapy. 

In addition to these distinctions, other aspects of therapy 
as assessed in efficacy studies may limit our confidence in their 
applicability to actual practice. The following section explores 
several other concerns about psychotherapy outcome research. 

1. The informed consent procedures required in research may
introduce doubts about the treatment that have no direct parallel
in real world settings. Typically, informed consent forms alert
the patient to the possibility that they are not receiving the
"real" treatment, which could be distracting or could detract
from the quality of the therapeutic alliance and the patient's
faith in their therapist. This could reduce the impact of
treatments studied in these ways. 

2. The process of random assignment to treatments does not
represent the process through which patients usually select a 
therapist. Actual patients in psychotherapy often get there by 
actively "shopping", rather than being randomly assigned to a 
particular therapist or treatment condition. Patients are often 
free to choose from among a variety of therapists offering 
different styles of treatment. If the match between therapist
personality and patient personality is important for treatment 
success, the use of random assignment may universally deflate the 
observed impact of psychotherapy. The credibility of a treatment 
is important in establishing the patient's expectancy of
improvement (a known contributor to patient progress). When
patients are free to choose their treatment, it seems likely that
the method has greater credence (if for no other reason than
cognitive dissonance: "I wouldn't have picked this therapy if it
weren't the best for me"). The Hester et al. (1990) study of
alcoholism treatment provided evidence that compatibility of a
treatment approach with the client's beliefs before entering 
therapy can heighten therapy success. 

However, in actual practice, most patients are only modestly 
informed consumers. Many never carefully research their 
therapists before signing on. In addition, their freedom to 
choose is increasingly being limited by managed care policies. 
Therefore, this distinction may be of limited relevance. 

3. Extraneous confounds are a problem in many outcome studies, 
despite randomization procedures. Because of the multitude of 
uncontrolled variables that potentially influence outcome, the 
sample size of any particular study is never sufficient to ensure 
that, even with random assignment, groups will end up equivalent
with regard to all possible confounds. Therefore, even when 
statistically significant outcome differences between treatment 
groups are obtained, we often cannot be sure that these 
differences are due to the treatment variables under study, 
rather than to inadvertently confounded factors.
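
The scale of this problem can be illustrated with a minimal Python
sketch. It is not drawn from any of the studies cited here. It
assumes, purely for illustration, that baseline characteristics are
independent and normally distributed and that an imbalance is flagged
at the conventional .05 level. The function names and the group size
of 20 are hypothetical.

```python
import random
import statistics


def prob_any_imbalance(n_covariates, alpha=0.05):
    """Chance that at least one of n independent covariates differs
    'significantly' between two randomized groups at level alpha."""
    return 1 - (1 - alpha) ** n_covariates


def simulate_max_gap(n_per_group, n_covariates, seed=0):
    """Randomly generate two groups and return the largest baseline gap
    (in SD units) between group means across all covariates."""
    rng = random.Random(seed)
    largest = 0.0
    for _ in range(n_covariates):
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        gap = abs(statistics.mean(group_a) - statistics.mean(group_b))
        largest = max(largest, gap)
    return largest


for k in (1, 10, 50):
    print(k, "covariates:", round(prob_any_imbalance(k), 2))

# Largest chance gap among 50 covariates with 20 patients per group
print("largest gap (SD units):", round(simulate_max_gap(20, 50), 2))
```

Under these assumptions, the chance of at least one baseline
imbalance is roughly 40% with 10 uncontrolled variables and exceeds
90% with 50, even though assignment was fully random.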

Various patient factors can compromise the unambiguous 
interpretation of clinical studies. In a placebo-controlled study of
the drug Clofibrate (Coronary Drug Project Research Group, 1980),
used to treat coronary heart disease, patient compliance proved 
the best predictor of longevity. Patients who took medication as 
directed at least 80% of the time had a lower mortality rate than 
those who were less compliant (15% versus 25% mortality within 
the five years following initiation of drug treatment),
independent of whether the patients were taking the active 
medication or placebo. This illustrates the need to hold such 
patient factors constant across experimental and control groups, 
but often they are not even assessed. 

Selective attrition is an especially common and vexing 
source of confounding in outcome research. In conducting research 
with patients, it is virtually impossible to avoid missing data, 
because patients routinely fail to provide complete information 
and often fail to complete treatment regimens. Missing data 
compromise random assignment, reducing the controlled clinical 
trial to the status of a flawed quasi-experiment.

Not all researchers seem to appreciate the seriousness of
this problem; some report data from studies with substantial
differential drop-out across treatment conditions as if this were
not germane to the question of relative treatment effectiveness.
Differential drop-out should temper our conclusions about
particular treatment approaches; the best treatment in the world 
is of little value if no one is willing to use it. This, in fact, 
is the major hurdle that many pharmacotherapy methods face; 
noncompliance rates for psychotropic medications are very high, 
making even pharmacologically effective drugs a very imperfect 
solution.

Selective attrition can skew the average level of patient 
functioning in some therapy conditions, giving them an unfair 
advantage. Other researchers seem to overlook how excluding
certain cases could inflate the apparent success of a treatment. 
For example, Birmaher, Holder, Johnson and Kolko (1996) advocate 
methods they used with adolescents, just mentioning in passing 
that in addition to a 9% drop out rate, "22% [of the subjects] 
were removed from the study because they failed to improve or 
deteriorated." (p. 2)

4. Many debate the appropriateness of particular control group
conditions. The use of credible, active placebos in double-blind 
studies (in which a new treatment is compared to a persuasive 
placebo, and both the patient and doctor are "blind" to which is 
being administered) is ideal, but rare. Often clinicians are 
asked to rate their own patients' progress, obviously aware of 
the treatment received. Waiting-list and delayed treatment 
control groups provide baseline data about spontaneous remission 
and the effectiveness of informal methods of problem resolution, 
but fail to speak to the specific advantages of one particular 
mode of intervention over another. Use of credible, active 
placebo treatments is needed to assure that participants are
comparably motivated across treatment conditions, because client 
expectancy for improvement is known to influence outcome. 

5. While in research patients are prescreened for one pure 
diagnosis and excluded if complicating, high-risk factors are 
present, in clinic therapy, patients are not excluded if they 
fail to be ideal candidates for the treatment being contemplated. 
In real practice, the therapist is obliged to seek the sequence 
of interventions that will reach even the most difficult patient.
Actual patients present multiple therapeutic challenges; many 
clinicians believe that the majority of their clients qualify for 
Axis II codiagnoses. 

Often, the inclusion/exclusion criteria of a study make it
difficult for the practitioner to know the extent to which a 
particular patient in his or her practice resembles a patient who 
would have qualified for the study. Since few practitioners use 
structured diagnostic interviews or psychometric instruments to
evaluate their patients, a study based on carefully diagnosed 
patients will have limited generalizability to situations in 
which diagnoses are not made reliably (Sperry et al., 1996).

6. Research therapy focuses on experimenter-defined outcomes
(usually specific symptom reduction), while real therapy
emphasizes patient-determined objectives. Non-research
psychotherapy is usually aimed at improvement in general 
functioning, and attainment of some global goals the patient 
values, rather than reduction of delimited symptoms. Removal of a 
specific symptom may constitute complete success to the 
researcher but fail to satisfy the typical psychotherapy 
consumer.

7. Non-research therapy is not usually of a predetermined, 
scripted, fixed duration and format (even with managed care, 
changes in plan are possible), and is expected to be
self-correcting; if one method isn't working, another is quickly tried
and assessed. Much of the talent of skilled therapists lies in 
their capacity to individualize the treatment process, flexibly 
drawing upon various methods that they tailor to the specific and 
shifting needs of an individual patient. Real therapists are 
responsible for avoiding untoward reactions, often without the 
reassurance of a "safety net", and luckily are not constrained by 
rigid protocols in so doing. 

Standardization of treatment creates artificialities that 
limit the generalizability of research findings to actual 
clinical settings. Drug trials allow extrapolations about how 
patients will generally respond to a proffered treatment, because 
delivery of a particular dosage of medication varies little 
across practitioners. In contrast, psychotherapists' 
personalities define the quality of their treatments; the 
effectiveness of an interpersonal, educational psychotherapy 
intervention can be greatly influenced by its manner of 
presentation. Standardizing treatment in order to reduce this 
inter-clinician variability can create a stilted process very 
unlike that which occurs in actual therapy. Reducing the 
spontaneity of clinicians' responses to an inherently 
unpredictable process that demands flexible, spontaneous 
responding probably reduces the power of the treatment process.
More structured treatment methods may fare better in these 
comparative studies in part because less of these methods gets 
lost in the translation. Attempts to standardize less rigidly 
structured treatment methods may create a more awkward situation 
for therapists; the resulting approach may seem more confusing 
and tentative to clients. This awkwardness may detract from these 
treatments' measured effectiveness, leading us to underestimate 
their actual impact when delivered in vivo. In actual practice, 
therapists familiar with these less rigidly structured techniques 
may present them with a confidence and flair that is absent in 
the controlled investigations. Adherence to constraining therapy 
manuals may therefore make the research version of many therapies 
less effective than the real thing. 

8. Individual therapist effects are confounded with treatment 
effects, because although patients may be randomly assigned to 
treatment conditions, therapists rarely are. Consequently, 
outcome findings reflect therapist-by-treatment interactions;
however, therapist characteristics are usually neither well
controlled nor well described. As typically implemented, outcome 
research design does not permit the attribution of outcome 
findings to treatment effects alone. 

When employed, the strategy of using therapists as their own 
controls across different therapy modalities also limits external 
validity, because this practice assures that many therapies in 
research studies are being delivered by less than fully 
enthusiastic proponents. This could dilute the observed 
effectiveness of all therapies across the board, assuming that 
therapists' loyalties are evenly distributed across the different 
schools of therapy.

9. Bias caused by the researcher's commitment to a particular 
therapeutic approach may also distort findings. Reviews have 
found that therapies preferred by the investigators tend to yield 
better outcomes than other treatment modalities being used for 
comparison (Robinson et al., 1990). More time and energy may be
lavished on perfecting the favored treatment method, and training 
therapists in its use. Frequently the outcome measures used are 
indirect and subject to experimenter bias. Self reports from 
clients and therapists' ratings of overall improvement are 
susceptible to contamination, yet the majority of outcome studies 
use such measures.

10. Use of inexperienced therapists also plagues many studies, 
further limiting their generalizability. Arguably, such studies
discern the best treatment methods for use by inexperienced
therapists. They do not really provide answers about which
therapeutic techniques afford the greatest potential for 
mobilizing change when practiced after years of experience. 

Treatments don't work by themselves. Those pushing for 
universal application of ESTs seem to overlook therapist factors, 
which seem to be most important in more complex, difficult cases. 
Particular treatments are not "effective" in and of themselves. 
They have the potential to be effective when provided sensitively 
by competent practitioners, who objectively monitor the effects 
of their work, and revise their approach as feedback indicates. 
Many believe this "art" component of clinical practice will 
continue to be important. 

The psychotherapy process literature contains many examples 
showing the difficulty in training even experienced therapists to 
practice a given therapeutic approach correctly. Adherence to 
technique or treatment fidelity is a problem that has marred many 
efficacy investigations. For those more experienced with 
traditional treatment, conducted over a long period of time, the 
switch to working with highly focused treatment plans and ESTs is 
often very difficult. 

Research may also underestimate actual therapy's 
effectiveness because the closely monitored, supervised therapy 
being provided in many efficacy studies may be marred by the 
therapist's elevated self consciousness. Whose best treatment 
moments occur during videotaped training sessions? Many 
therapists find that close supervision can be distracting for 
them and detract from their clinical effectiveness. Some even 
rationalize that doing more poorly during episodes of closely 
monitored therapy may actually be a sign of a good therapist; 
anyone immune to this interference in these situations would 
probably have to be so insensitive to evaluation apprehension 
that they would be unable to empathize sufficiently with their 
anxious patients!

11. Various measurement problems probably limit the 
meaningfulness of outcome research. Use of unreliable, 
insensitive, or irrelevant outcome measures potentially acts to 
underestimate treatment effect sizes. On the other hand, 
regression effects may produce overestimates of therapy's
effectiveness. Since there tends to be movement from improbable 
states to probable states, if clients begin treatment in extreme 
distress, the odds are that over time they will show lower levels 
of distress, and that the worst clients will show the most gains 
from treatment. When selective attrition unbalances different 
treatment groups in terms of their initial symptom severity, it 
is virtually impossible to rule out the contaminating influence 
of regression effects. 
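
A small simulation can make the regression effect concrete. The
following Python sketch is illustrative only: it assumes that each
client has a stable underlying distress level, that each assessment
adds independent measurement noise, and that no treatment at all is
given. The scale (mean 50, SD 10), the noise level, and the function
name are hypothetical choices, not values taken from any study.

```python
import random
import statistics


def simulate_regression(n=10000, top_fraction=0.1, seed=1):
    """Return (intake mean, follow-up mean) for the clients whose
    intake scores were the most extreme, with no treatment applied."""
    rng = random.Random(seed)
    # Stable underlying distress for each client
    true_level = [rng.gauss(50, 10) for _ in range(n)]
    # Two noisy assessments of the same unchanged underlying level
    intake = [t + rng.gauss(0, 10) for t in true_level]
    followup = [t + rng.gauss(0, 10) for t in true_level]
    # Select the clients with the highest intake distress
    cutoff = sorted(intake)[int(n * (1 - top_fraction))]
    selected = [i for i in range(n) if intake[i] >= cutoff]
    intake_mean = statistics.mean(intake[i] for i in selected)
    followup_mean = statistics.mean(followup[i] for i in selected)
    return intake_mean, followup_mean


before, after = simulate_regression()
print(f"most distressed at intake: {before:.1f} -> follow-up: {after:.1f}")
```

Under these assumptions the most distressed clients appear to
"improve" at follow-up even though nothing was done for them, which
is the pattern a treatment group unbalanced on initial severity would
also show.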

The criterion used universally to assess the efficacy of 
therapy has been a reduction or cessation of symptoms in the 
patient. Symptomatic improvement can be discerned through
behavioral indicators, patients' self-reported experience, 
therapists' rating, or assessment of psychological functioning by 
"blind" observers. Ratings by members of the patient's home 
environment have been used in studies evaluating treatments for 
children or hospitalized adults, but only rarely in cases where 
psychological treatments are applied to outpatient adult samples. 
While it might be very valuable to get objective information 
about the patient's functioning at work or in their home 
community, the requirement of confidentiality precludes the use 
of such potentially disruptive measures in most cases. 

Speed of improvement, unpleasant side effects, and emotional 
and financial costs should also be considered when comparing 
psychotherapeutic treatments. Research should focus on efficiency 
rather than just effectiveness. Howard et al. (1997) recommend
future effect size calculations that take into account cost
(e.g., rate of improvement) as well as amount of benefits.

12. Patient and therapist expectations may taint many of our
estimates of outcome. Interactions between treatment features and 
outcome measures can introduce considerable distortion into the 
treatment comparison process. Treatments that foster clearer 
demand characteristics may appear superior if the dependent 
measures transparently assess the types of changes the therapist 
has been explicitly advocating during treatment. Certain 
treatments may, in effect, give patients the right answers, 
inflating the treatments' apparent efficacy, independent of the 
treatments' actual impact on the ensuing quality of the patient's 
life. For example, when a cognitive therapist indoctrinates the 
patient in how to think more rationally, and then a researcher 
inquires about the patient's habits of thought, the possibility 
exists that the patient is giving the answers they have been 
taught, without necessarily successfully deploying this new 
cognitive style on a regular basis. 

13. Research often focuses on statistical rather than clinical 
significance. Statistical significance is a joint function of 
effect size and sample size; when samples are large enough, 
trivial group differences can emerge as statistically noteworthy. 
For example, in comparing two weight loss treatments, a 
difference between groups of one ounce per month would be 
practically irrelevant, but might be sufficiently improbable to 
pass our statistical tests of significance. The statistical 
conclusion conveys little or nothing about clinical or "real
world" significance. Statistical significance indicates merely
that an observed difference would be unlikely to arise by
chance alone.
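
The weight loss example can be worked through numerically. The
Python sketch below is a simplified illustration, not an analysis of
real data: it assumes a two-sample z test with a known, equal
standard deviation in both groups, and the 2-pound standard deviation
of monthly weight change is a hypothetical value chosen only to make
the arithmetic concrete.

```python
import math


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p value for a two-sample z test with known, equal SDs."""
    se = sd * math.sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / se
    return math.erfc(z / math.sqrt(2.0))


ONE_OUNCE_IN_POUNDS = 1.0 / 16.0
ASSUMED_SD = 2.0  # hypothetical SD of monthly weight change, in pounds

# The same one-ounce difference, tested at increasing sample sizes
for n in (100, 10_000, 100_000):
    p = two_sided_p(ONE_OUNCE_IN_POUNDS, ASSUMED_SD, n)
    print(f"n per group = {n:>7}: p = {p:.3g}")

# The standardized effect size does not depend on sample size
print("Cohen's d =", ONE_OUNCE_IN_POUNDS / ASSUMED_SD)
```

With these assumed values, the one-ounce difference is far from
significant at 100 patients per group but passes the .05 threshold at
10,000 per group, while the standardized effect (d of about 0.03) is
identical in every case.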

According to Trull, Nietzel and Main (1988), psychotherapy
effectiveness has traditionally been assessed through studies 
that tend to neglect the issue of clinical significance, or the 
extent to which clinical outcomes represent a meaningful 
magnitude of change (Nietzel, Russell, Hemmings, & Gretter,
1987). Clinical significance involves the extent to which therapy
has made a desired and meaningful difference in patients' lives. 

Those who confuse practical, clinical significance with 
statistical significance can easily be misled by outcome 
research. It makes little sense to champion a "statistically 
superior" treatment if its advantage for the patient is
inconsequential (for example, an average score that is 1 point 
lower on a 100 point self report scale). Jacobson (1995) found
that when the results of treatment studies are examined in terms 
of their clinical significance, the conclusions can be 
disturbing. In the NIMH TDCRP study (Elkin et al., 1989), the
proportion of clients who completed the 12-week, 20-session 
treatment, recovered from their depressive episode, and stayed 
nondepressed for 18 months ranged from 19% to 32% across the 
three active treatments (imipramine, cognitive therapy and 
interpersonal psychotherapy). Only a minority of patients
recovered and stayed recovered for more than a year, and the 
placebo treatment was comparably effective (20%). Nothing
produced lasting recovery for the majority of cases. 

Jacobson points out that "the TDCRP is widely considered to 
have achieved the highest degree of methodological rigor of any 
large-scale outcome study yet conducted, and thus has produced 
results that are more believable than those from many other 
trials of dubious design quality. These findings are not 
atypical, either for major depression or for other mental health 
problems. In a series of studies of clinical significance, our 
research group has examined conduct disorders in adolescents, 
couples seeking therapy for marital distress, and people with 
anxiety disorders. We have found the recovered patient (the one 
who shows few or no signs of symptoms of the initial complaint 
and believes him- or herself to be "cured") to be the exception 
rather than the rule for every type of disorder examined and for 
every type of therapy that we have looked at -- psychodynamic,
behavioral, cognitive and family therapy. When one considers even 
more intractable problems, such as addictive behaviors, 
schizophrenia and personality disorders, the clinical 
significance data are even more bleak. The only exception we have 
found thus far to these modest recovery rates is the cognitive
behavioral treatment of panic disorder, developed by David Clark 
at Oxford University and David Barlow at the State University of 
New York in Albany." (Jacobson, 1995, p. 37)

This does not mean that psychotherapy never produces 
recovery or that therapies are not differentially effective. It
simply indicates that psychotherapy generally yields rather
modest recovery rates. There is no particular treatment modality 
that is uniquely subject to this criticism; most therapies 
examined seem somewhat deficient in terms of clinical 
significance.

14. A final problem in applying efficacy research involves the 
leap clinicians must make from normative findings to their 
individual work with individual therapy cases. What is true in 
general may not necessarily be true in particular cases. One of
the first things every student of statistics is taught is that 
you can't generalize from group findings to individual cases if 
there is substantial within-group variability. Men may on average 
be taller than women, but this does not hold true in all cases.
Yet as practitioner-scientists, therapists are routinely asked to
draw upon normative findings in shaping treatment for individual 
cases. 

Some argue that global assertions about psychotherapy's 
efficacy are meaningless, because even though research shows that 
psychotherapy on average has a positive effect, we cannot infer 
from it that a particular treatment will work for a particular 
patient treated by a particular therapist. According to Robyn
Dawes (1994), it is important to remember that normative data on
psychotherapy's general effectiveness do not offer proof of its
effectiveness in every individual case. As Dawes puts it, "success
in therapy is far from assured, even though it works overall in a 
statistical sense. Someone who is dissatisfied with their current 
progress in therapy should not be inhibited about changing 
therapists or mode of treatment. (The therapist that is abandoned 
may attribute this decision to the depth of the client's 
pathology, but so what.) ...Statements from professionals that 
they "know" much better than the client what is "needed" may 
often best be politely ignored -- especially when these statements
are made after minimal contact, followed by a standard diagnostic 
label. If verbal therapy is sought, find someone empathetic. 
Unfortunately, I have no good advice about how to judge whether 
someone is empathetic before getting to know that person."
(pp. 73-74)

High variability within treatment groups makes it difficult 
to know how to apply research findings in individual cases. Data 
from efficacy studies show considerable variation among patients
assigned to the same group, indicating how differently individual
patients respond to the same treatment. When within-group
variation exceeds measurement error, there are reliable 
differences among patients within treatments. Effect sizes are 
seldom large enough to eliminate overlap between the outcomes of 
patients in the treatment groups being compared (Howard, Krause,
& Vessey, 1994). Accordingly, there are almost always some
patients in the inferior treatment that achieve better outcomes 
than some patients in the superior treatment, and vice versa. 
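
The degree of overlap implied by a given effect size can be
estimated directly. The Python sketch below is a rough illustration
that assumes outcomes in both groups are normally distributed with
equal variance; real outcome distributions may depart from this. The
function name and the effect sizes shown are hypothetical examples
rather than values from a particular study.

```python
import math


def prob_inferior_patient_does_better(d):
    """Probability that a randomly chosen patient from the weaker
    treatment outscores a randomly chosen patient from the stronger
    one, given a standardized mean difference d (normal outcomes,
    equal variance assumed)."""
    # The difference of two unit-variance normals has variance 2,
    # so P = Phi(-d / sqrt(2)) = 0.5 * erfc(d / 2).
    return 0.5 * math.erfc(d / 2.0)


for d in (0.2, 0.5, 0.8, 1.0):
    share = prob_inferior_patient_does_better(d)
    print(f"d = {d:.1f}: {share:.0%} of such pairings favor the weaker treatment")
```

Even at d = 0.8, conventionally regarded as a large effect, more
than one in four such pairings would favor the patient who received
the "inferior" treatment under these assumptions.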

Actual clinical practice is individual, case by case work, 
with an N of 1. At the same time that we must be cautious about
extrapolating inappropriately from normative research to
individual cases, we must also be wary of the tendency to believe
in the necessary superiority of therapists' individualized, 
intuitive understanding of clients. Research by Dawes, Meehl, and 
others suggests that the impression that individualized 
understandings are more accurate than broad generalizations based 
on established principles is often illusory (Dawes, 1994) . 

Practicing therapists must also guard against confirmation 
bias, the failure to check the accuracy of their beliefs about 
patients by obtaining relevant ancillary information, and must 
realize that patients may also make this type of error. 
Successful clients may attribute their improvement to treatment 
because there is no way for them to evaluate what would have 
happened without treatment. There is considerable ambiguity in 
actual clinical practice. Coping with this lack of certainty and 
applying principles established on a probabilistic basis are 
challenges all psychotherapists must continually face. 

Estimating Clinical Representativeness of ESTs 

One way of estimating the magnitude of this generalization 
problem is to consider the few studies that have explicitly 
compared the treatment effect sizes of experimental studies 
conducted in actual clinical settings versus those performed in 
university contexts. Smith, Glass, and Miller (1980) found that 
effect sizes from studies conducted in clinical settings (mental 
health centers, d = .47; other outpatient facilities, d = .79) 
were smaller than those from college-based studies (d = 1.04). 
Consistent with this are the findings of the Weisz et al. (1992) 
review of four meta-analyses based on more than twelve thousand 
children and adolescents who participated in more than two 
hundred controlled studies. Weisz et al. (1992) concluded that 
while overall children and adolescents benefitted from treatment, 
effects were more modest among the more representative clinic 
studies. In fact, most of the six studies they reviewed of 
referred patients treated in clinics for more general 
psychopathology did not show significant treatment effects. 

On the other hand, several researchers have failed to find 
statistically significant differences between studies conducted 
in laboratory and clinical settings (Jorm, 1989; Shapiro & 
Shapiro, 1982; Steinbrueck, Maxwell, & Howard, 1983). In a very 
ambitious investigation, Shadish et al. (1997) selectively 
reviewed 56 non-university outcome studies whose conditions most 
closely mimicked those of actual practice. They found that effect 
sizes from the more representative clinic therapy studies (d = 
.54, based on 46 Stage I studies) were generally comparable to 
those from the studies comprising the original meta-analyses they 
used (d = .59). 

In contrast, an evaluation of stuttering treatment studies 
(Andrews & Harvey, 1981) found larger effect sizes from clinic 
studies (outpatient settings: d = .76; inpatient settings: d = 
1.00) than from university-based studies (d = .67). These last 
results, favoring treatment provided in actual clinical settings, 
are in accord with the findings of the 1995 Consumer Reports (CR) 
naturalistic effectiveness study, which revealed "90% of 
[psychotherapy] patients doing well, in contrast to efficacy 
studies which are usually in the 65% range" (Seligman & Levant, 
1998, p. 212). 

Seligman and Levant highlight several conclusions from the 
CR study: "long term therapy worked much better than short term 
therapy; no particular modality of therapy or medication exceeded 
any other for any disorder; and insurance limits on choice and 
duration of therapy predicted worse outcome. These conclusions 
are notable because each contradicts what is often found by the 
efficacy method and suggests that the outcome of therapy 'in the 
field' may be quite different from findings of laboratory 
efficacy studies" (p. 212). 



Summary 

While the use of ESTs to guide clinical practice is 
advantageous in many respects, awareness of the limitations of 
the efficacy studies on which the EST list is based may help to 
prevent overly simplistic thinking about treatment standards. 
Balanced consideration of the findings from these studies should 
help limit inappropriate generalizations. Integrating the 
findings from naturalistic studies can also foster more accurate 
understanding of the challenges that confront "real life" 
clinicians. 